Zap Q-Learning With Nonlinear Function Approximation

Neural Information Processing Systems

Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting, and optimal stopping. This paper introduces a new framework for analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to choice of function approximation architecture.
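As a rough illustration of the stochastic approximation framework the abstract refers to, the Zap recursion pairs a parameter update with a faster-running estimate of the mean-field Jacobian, whose negative inverse serves as the matrix gain. Below is a minimal sketch on a synthetic linear root-finding problem; the problem instance, noise model, and step-size exponents are assumptions of this illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic root-finding problem (an assumption of this sketch): find theta
# solving f_bar(theta) = b - A @ theta = 0, observed only through noisy samples.
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])
b = np.array([1.0, -1.0])
theta_star = np.linalg.solve(A, b)

theta = np.zeros(2)
A_hat = -np.eye(2)   # running Jacobian estimate; overwritten at n = 1
for n in range(1, 100001):
    f_n = b - A @ theta + rng.standard_normal(2)        # noisy sample of f_bar
    A_n = -A + 0.1 * rng.standard_normal((2, 2))        # noisy Jacobian sample
    alpha, beta = 1.0 / n, 1.0 / n ** 0.85              # two time scales
    A_hat += beta * (A_n - A_hat)                       # fast: matrix gain estimate
    theta += alpha * np.linalg.solve(-A_hat, f_n)       # slow: Newton-Raphson-like step
```

Because the Jacobian estimate runs on the faster step size `beta`, the parameter update sees an effectively converged gain, which is what yields the Newton-Raphson-like transient behaviour described in the abstracts below.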


Zap Q-Learning

Devraj, Adithya M, Meyn, Sean

Neural Information Processing Systems

The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-scale update equation for the matrix gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even for non-ideal parameterized settings. Numerical experiments confirm the quick convergence, even in such non-ideal cases.
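The matrix-gain, two time-scale update described in the abstract can be sketched in the tabular setting: the Jacobian estimate is updated with a faster step size than the parameter vector, and its negative inverse plays the role of a Newton-Raphson gain. A minimal sketch on a toy MDP follows; the MDP, the step-size exponents, the conditioning offset in `beta`, and the pseudo-inverse safeguard are assumptions of this illustration, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic MDP (an assumption of this sketch): 2 states, 2 actions.
# Action 0 stays in the current state, action 1 switches state.
nS, nA, gamma = 2, 2, 0.8
R = np.array([[0.0, 1.0],    # rewards in state 0 for actions (stay, switch)
              [2.0, 0.0]])   # rewards in state 1
nxt = np.array([[0, 1],
                [1, 0]])     # deterministic successor states

# Reference solution via Q-value iteration.
Q_star = np.zeros((nS, nA))
for _ in range(200):
    Q_star = R + gamma * Q_star[nxt].max(axis=2)

# Tabular Zap Q-learning: theta is the flattened Q-table, and psi(s, a) is a
# standard basis vector, so the matrix gain acts on the full parameter vector.
d = nS * nA
theta = np.zeros(d)
A_hat = -np.eye(d)                     # estimate of the mean-field Jacobian
s = 0
for n in range(1, 100001):
    a = int(rng.integers(nA))          # uniform exploration
    s2 = nxt[s, a]
    Q = theta.reshape(nS, nA)
    a2 = int(Q[s2].argmax())           # greedy successor action
    psi = np.zeros(d);  psi[s * nA + a] = 1.0
    psi2 = np.zeros(d); psi2[s2 * nA + a2] = 1.0
    td = R[s, a] + gamma * Q[s2, a2] - Q[s, a]   # temporal-difference error
    # Two time scales: the matrix gain runs on the faster step size beta.
    # The offset in beta keeps the early estimate well conditioned (an
    # assumption of this sketch, not part of the published algorithm).
    alpha = 1.0 / n
    beta = (n + 10.0) ** -0.85
    A_n = np.outer(psi, gamma * psi2 - psi)
    A_hat += beta * (A_n - A_hat)
    # Stochastic Newton-Raphson step with gain -A_hat^{-1}; the pseudo-inverse
    # guards against transient singularity of the estimate.
    theta += alpha * (-np.linalg.pinv(A_hat)) @ (psi * td)
    s = s2

Q_zap = theta.reshape(nS, nA)
```

With the ordinary scalar-gain update in place of the matrix gain, the same 1/n step size is exactly the regime where Watkins' algorithm can suffer very high asymptotic variance, which is the motivation for the matrix gain above.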



Review for NeurIPS paper: Zap Q-Learning With Nonlinear Function Approximation

Neural Information Processing Systems

Summary and Contributions: This paper introduces a version of Zap Q-learning that can be applied to arbitrary approximation architectures for Q-functions. Convergence analysis is undertaken, and a version of the algorithm with MLP function approximators is applied to several classical control tasks.

POST-REBUTTAL: I thank the authors for their response. I appreciate the comments around the reorganisation of material, and the clarification of some of the technical points I raised. There are two main concerns that I have with the paper that prevent me from strongly recommending acceptance, described below.


Review for NeurIPS paper: Zap Q-Learning With Nonlinear Function Approximation

Neural Information Processing Systems

The reviewers are generally supportive of the paper. They have provided some very useful feedback, and I strongly encourage the authors to incorporate it. In particular, it would be ideal to complete the paper reorganization as discussed, explain the limitations of the assumption on boundedness of the iterates, provide a toy example where the boundedness assumption is not on its own enough to prevent divergence of Q-learning (i.e., even under that assumption, Q-learning diverges but Zap-Q does not), and finally to sweep over the parameters in the empirical comparison (even if that means the outcome is less positive for Zap-Q).


Reviews: Zap Q-Learning

Neural Information Processing Systems

The paper proposes a variant of Q-learning, called Zap Q-learning, that is more stable than its precursor. Specifically, the authors show that, in the tabular case, their method minimises the asymptotic covariance of the parameter vector by applying approximate second-order updates based on the stochastic Newton-Raphson method. The behaviour of the algorithm is analysed for the particular case of a tabular representation, and experiments are presented showing the empirical performance of the method in its most general form. This is an interesting paper that addresses a core issue in RL. I have some comments regarding both its content and its presentation.


